Abstract

We propose to interpret distribution model risk as sensitivity of expected
loss to changes in the risk factor distribution, and to measure the
distribution model risk of a portfolio by the maximum expected loss over a
set of plausible distributions defined in terms of some divergence from an
estimated distribution. The divergence may be relative entropy, a Bregman
distance, or an $f$-divergence. We give formulas for the calculation of
distribution model risk and explicitly determine the worst case distribution
from the set of plausible distributions. We also give formulas for the
evaluation of divergence preferences describing ambiguity averse decision
makers.
1 The problem of model risk



Financial risk measurement, pricing of financial instruments, and portfolio 
selection are all based on statistical models. If the model is wrong, risk 
numbers, prices, or optimal portfolios are wrong. Model risk quantifies the 
consequences of using the wrong models in risk measurement, pricing, or 
portfolio selection. 

The two main elements of a statistical model in finance are a risk fac- 
tor distribution and a pricing function. Given a portfolio (or a financial 
instrument), the first question is: On which kind of random events does 
the value of the portfolio depend? The answer to this question determines 
the state space $\Omega$.^1 A point $r \in \Omega$ is specified by a collection of possible
values of the risk factors. The state space codifies our lack of knowledge
regarding all uncertain events affecting the value of a given instrument or
portfolio. The specification of a distribution class and some parameter
estimation procedure applied to historical data determines some best guess
risk factor distribution, call it $P_0$. The second central element is a pricing
function $X : \Omega \to \mathbb{R}$ describing how risk factors impact the portfolio value
at some given future time horizon. We work in a one-stage set-up. Often 
modellers try to use only risk factors which are (derived from) prices of ba- 
sic financial instruments. Describing the price of the portfolio as a function 
of the prices of these basic instruments is a modelling exercise, which is 
prone to errors. It involves asset pricing theories of finance with practically 
non-trivial assumptions on no arbitrage, complete markets, equilibrium, etc. 
Together the risk factor distribution and the pricing function determine the 
profit loss distribution. In a last step, a risk measure associates to the profit 
loss distribution a risk number describing a capital requirement. 

Corresponding to the two central elements of a statistical model we
distinguish two kinds of model risk: distribution model risk and pricing model
risk. This paper is concerned with distribution model risk.^2 (For an
interesting approach to pricing model risk we refer to Cont [2006].) Although
$P_0$ is a best guess of the risk factor distribution, one is usually aware that
due to model specification errors or estimation errors the data generating
process might be different from $P_0$. Distribution model risk should quantify



^1 It is possible to choose a larger state space including variables which do not affect the
value of the given portfolio. This could allow comparing different portfolios which do not
all depend on the same risk factors. Typically modellers try to keep the number of risk
factors small and therefore use a smaller state space. With various techniques they try
to model some risk factors as a function of a smaller set of risk factors. Thus the number of
risk factors actually used in the model, although it may go into the thousands, is typically
much smaller than the number of variables influencing the loss.

^2 Gibson [2000] uses the term model risk for what we call distribution model risk. For a
first classification of model risks we refer to Crouhy et al. [1998]. Distribution model risk
encompasses both estimation risk and misspecification risk in the sense of Kerkhof et al.
[2010], but here we do not need to distinguish the two.
the consequences of working with $P_0$ instead of the true but unknown data
generating process. We propose to measure distribution model risk by

$$\mathrm{MR} := -\inf_{P \in \Gamma} E_P(X) \tag{1}$$

where $\Gamma$ is some set of plausible alternative risk factor distributions. So
MR is the negative of the worst expected value which could result if the risk
factor distribution is some unknown distribution in $\Gamma$. We propose to choose
for $\Gamma$ balls of distributions, defined in terms of some divergence, centered at
$P_0$:

$$\Gamma = \{P : D(P\|P_0) \le k\}, \tag{2}$$

where the divergence $D$ could be the relative entropy (synonyms:
Kullback-Leibler distance, $I$-divergence), some Bregman distance, or some
$f$-divergence. $\Gamma$ contains all risk factor distributions $P$ whose divergence
from $P_0$ does not exceed some radius $k > 0$. The parameter $k$ has to be chosen
by hand and describes the degree of uncertainty about the risk factor distribution.
For larger values of $k$ the set of plausible alternative distributions is larger,
which is appropriate for situations in which there is more model uncertainty.
In Section 3 we give the definitions of various divergences and discuss the
choice of divergence $D$.

In a previous paper (Breuer and Csiszár [2012]) we have addressed
Problem (1) for the special case where $D$ is the relative entropy, assuming
some regularity conditions which ensure that the worst case distribution solving
(1) is from some exponential family. The present paper first extends those
results, giving the solution for the pathological cases when these regularity
conditions are not met (Section 5). Second, as main mathematical result,
we provide the solution to Problem (1), including the characterization of
the minimiser when it exists, for $\Gamma$ of the form (2) defined in terms of a
convex integral functional (Section 6). The special cases of Bregman balls
and $f$-divergence balls are treated in Section 7. Finally, in Section 8 we will
address the related, mathematically simpler, problem

$$W := \inf_{P} \big[ E_P(X) + \lambda D(P\|P_0) \big], \qquad \lambda > 0. \tag{3}$$

Decision makers with divergence preferences rank alternatives $X$ by this
criterion. We apply the methods of Section 6 to derive an explicit solution
for the divergence preference problem (3).

Mathematically, our approach will be to exploit the relationship of
Problem (1) to that of minimizing convex integral functionals (and specifically
relative entropy) under moment constraints. The tools we need do not go
beyond convex duality for $\mathbb{R}$ and $\mathbb{R}^2$, and many results directly follow from
known ones about the moment problem. While in the literature attention
is frequently restricted to essentially bounded $X$, here the $P_0$-integrability
of $X$ suffices.
2 Relation to the literature



Problem (1) has been addressed in the literature in two related contexts:
coherent risk measures and ambiguity. Law-invariant risk measures assign
to a profit loss distribution a number interpreted as risk capital. Artzner
et al. [1999] and Föllmer and Schied [2004] formulated requirements for risk
measures and coined the terms 'coherent' resp. 'convex' for risk measures
fulfilling them. Every coherent risk measure can be represented as (1) for
some closed convex set $\Gamma$ of probabilities.^3 The risk capital required for a
portfolio is the worst expected loss over the set $\Gamma$.

In the context of risk measurement, the model risk measure (1) is yet
another coherent risk measure. Defining the risk measure by the set $\Gamma$ via
the representation (1) is natural when addressing distribution model risk.
Risk measures defined in terms of the profit loss distribution, like Value
at Risk or Expected Shortfall, rely on a specific distribution model, which
may be misspecified or misestimated. For a fixed portfolio, represented by a
pricing function $X$, a different risk factor distribution gives rise to a different
profit loss distribution, and therefore to a different risk capital requirement.
Expression (1) measures exactly this model dependence.^4

On the other hand, Problem (1) describes ambiguity averse preferences:
A widely used class of preferences allowing for ambiguity aversion are the
multiple priors preferences, also known as maxmin expected utility
preferences, axiomatised by Gilboa and Schmeidler [1989].^5 (Another description
of ambiguity aversion is given by the divergence preferences (3).) Agents with
multiple priors preferences choose acts $X$ with higher worst expected utility,
where the worst case is taken over a closed convex set $\Gamma$ of finitely
additive probabilities. The set $\Gamma$ is interpreted as a set of priors held by the
agent, and ambiguity is reflected by the multiplicity of the priors.
Interpreting the choice of a portfolio as an act, the risk measure representation



^3 The representation theorem is due to Artzner et al. [1999] for finite sample spaces;
for general probability spaces see Delbaen [2002] or Föllmer and Schied [2002]. Its formal
statement is not needed for our purposes.

^4 One could object that Expected Shortfall is coherent and therefore can be represented
by eq. (1) as a maximum expected loss over some set $\Gamma$ of alternative distribution models.
The set $\Gamma$ equals $\{P = P_0[\,\cdot\,|A] : P_0(A) \ge \alpha\}$, which contains distributions so different from
$P_0$ that they are hardly plausible to arise from the same historical data by estimation or
specification errors. Or, one could represent Expected Shortfall by eq. (1) with $\Gamma$ as in (2),
taking $D = D_f$ as in (7) below, with the pathological convex function $f$ equal to 0 in the
interval $[0, 1/\alpha]$ and $+\infty$ otherwise [Föllmer and Schied 2004, Theorem 4.47]. But this $f$
does not meet the assumptions in Section 3 and the corresponding $D_f$ is not a divergence
in our sense.



^5 Gilboa and Schmeidler [1989] worked in the setting of Anscombe and Aumann [1963]
using lottery acts. Casadesus-Masanell et al. [2000] translated their approach to Savage
acts. In the Gilboa-Schmeidler theory the utility of outcomes occurs separately, whereas
in our notation the utility is part of the function $X$, which we would interpret as the utility
of outcomes.
(1) and the multiple priors preference representation agree, see Föllmer and
Schied [2002]. A decision maker who ranks portfolios by lower values of
some coherent risk measure displays multiple priors preferences. And vice
versa, a decision maker with multiple priors preferences acts as if she were
minimising some coherent risk measure.

In the context of the Gilboa-Schmeidler theory, our results provide
explicit expressions for the decision criterion of ambiguity averse decision
makers, in the special case that the priors set $\Gamma$ is given by (2). Choosing the
same $\Gamma$ for all agents may be at odds with a descriptive view of real agents'
preferences. But from a normative point of view our choice of $\Gamma$ in (2) is
motivated by general arguments (Section 3). Our results can serve as a
starting point for the further analysis of portfolio selection and contingent
claim pricing under model uncertainty, extending, among others, work of
Avellaneda and Paras [1996], Friedman [2002a,b], Calafiore [2007].

In the present context, the choice of $\Gamma$ by (2) with $D(P\|P_0)$ equal to
relative entropy has been proposed by Hansen and Sargent [2001], see also
Ahmadi-Javid [2011] and Breuer and Csiszár [2012]. Friedman [2002a] also
used relative entropy balls as sets of possible models. Hansen and Sargent
[2001, 2007, 2008], Barillas et al. [2009] and others have used a relative
entropy-based set of alternative models. Their work is set in a multiperiod
framework. It deals with questions of optimal choice, whereas we take the
portfolio $X$ as given. Maccheroni et al. [2006] presented a unified framework
encompassing both the multiple priors preference (1) and the divergence
preferences (3). They proposed to use weighted $f$-divergences, which are
also covered in our framework. Ben-Tal and Teboulle [2007, Theorem 4.2]
showed that their optimised certainty equivalent for a utility function $u$ can
be represented as divergence preference (3) with $D$ equal to the $f$-divergence
with the function $f$ satisfying $u(x) = -f^*(-x)$. For both, the worst case
solution is a member of the same generalised exponential family. This paper
makes clear the reasons.

Finally but importantly, the work of Ahmadi-Javid [2011] has to be cited
for solutions of (1) and (3), in case of relative entropy and of $f$-divergences,
in the form of convex optimization formulas involving two real variables (one
in the case of relative entropy). The relationship of these results to ours will
not be discussed here but we mention that in Ahmadi-Javid [2011] the
pathological cases for relative entropy treated in Section 5 were not addressed, and
the results for $f$-divergences were obtained under the assumptions that $f$ is
cofinite and $X$ is essentially bounded.
3 Measures of plausibility of alternative risk factor distributions

We define divergences between non-negative functions on the state space
$\Omega$, which may be any set equipped with a $\sigma$-algebra not mentioned in the
sequel, and with some measure $\mu$ on that $\sigma$-algebra. Here $\mu$ may or may
not be a probability measure. Then the divergence between distributions
(probability measures on $\Omega$) absolutely continuous with respect to $\mu$ is taken
to be the divergence between the corresponding density functions. In our
terminology, a divergence is non-negative and vanishes only for identical
functions or distributions. (Functions which are equal $\mu$-a.e. are regarded
as identical.) A divergence need not be a metric, may be non-symmetric,
and the divergence balls need not form a basis for a topology in the space
of probability distributions.

The relative entropy of two non-negative functions $p, p_0$ is defined as

$$I(p\|p_0) := \int_\Omega \Big[ p(r) \log\frac{p(r)}{p_0(r)} - p(r) + p_0(r) \Big]\, d\mu(r).$$

If $p, p_0$ are $\mu$-densities of probability distributions $P, P_0$ this reduces to the
original definition of Kullback and Leibler [1951],

$$I(P\|P_0) = \int \log\frac{dP}{dP_0}(r)\, dP(r) \qquad \text{if } P \ll P_0.$$

If a distribution $P$ is not absolutely continuous with respect to $P_0$, take
$I(P\|P_0) = +\infty$.^6

Bregman distances, introduced by Bregman [1967], and $f$-divergences,
introduced by Csiszár [1963, 1967] and Ali and Silvey [1966], are classes of
divergences parametrised by convex functions $f : (0, \infty) \to \mathbb{R}$, extended to
$[0, \infty)$ by setting $f(0) := \lim_{t \downarrow 0} f(t)$. Below, $f$ is assumed strictly convex
but not necessarily differentiable.

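The Bregman distance of non-negative (measurable) functions $p, p_0$ on
$\Omega$ with respect to a (finite or $\sigma$-finite) measure $\mu$ on $\Omega$ is defined by

$$B_{f,\mu}(p, p_0) := \int_\Omega \Delta_f(p(r), p_0(r))\, \mu(dr), \tag{4}$$

where, for $s, t$ in $[0, +\infty)$,

$$\Delta_f(s,t) := \begin{cases} f(s) - f(t) - f'(t)(s - t) & \text{if } t > 0 \text{ or } t = 0,\ f(0) < +\infty \\ s \cdot (+\infty) & \text{if } t = 0 \text{ and } f(0) = +\infty. \end{cases} \tag{5}$$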



^6 Note that $I(P\|P_0)$ is a less frequent notation for relative entropy than $D(P\|P_0)$; it
has been chosen here because we use the latter to denote any divergence.
If the convex function $f$ is not differentiable at $t$, the right or left derivative
is taken for $f'(t)$ according as $s > t$ or $s < t$.

The Bregman distance of distributions $P \ll \mu$, $P_0 \ll \mu$ is defined by

$$B_{f,\mu}(P, P_0) := B_{f,\mu}\Big(\frac{dP}{d\mu}, \frac{dP_0}{d\mu}\Big). \tag{6}$$

Clearly, $B_{f,\mu}$ is a bona fide divergence whenever $f$ is strictly convex in
$(0, +\infty)$. For $f(s) = s \log s - s + 1$, $B_f$ is the relative entropy $I$. For
$f(s) = -\log s$, $B_f$ is the Itakura-Saito distance. For $f(s) = s^2$, $B_f$ is the
squared $L^2$-distance.

The $f$-divergence between non-negative (measurable) functions $p$ and $p_0$
is defined, when $f$ additionally satisfies $f(s) \ge f(1) = 0$,^7 by

$$D_f(p\|p_0) := \int_\Omega f\Big(\frac{p(r)}{p_0(r)}\Big)\, p_0(r)\, \mu(dr). \tag{7}$$

At places where $p_0(r) = 0$, the integrand by convention is taken to be
$p(r) \lim_{s \to \infty} f(s)/s$. The $f$-divergence of distributions $P \ll \mu$, $P_0 \ll \mu$,
defined as the $f$-divergence of the corresponding densities, does not depend on
$\mu$ and is equal to

$$D_f(P\|P_0) = \int f\Big(\frac{dP_a}{dP_0}\Big)\, dP_0 + P_s(\Omega) \lim_{s \to \infty} \frac{f(s)}{s}, \tag{8}$$

where $P_a$ and $P_s$ are the absolutely continuous and singular components of
$P$ with respect to $P_0$. Note that if $f$ is cofinite, i.e., if the limit in (8) is $+\infty$,
then $P \ll P_0$ is a necessary condition for the finiteness of $D_f(P\|P_0)$, while
otherwise not.

For $f(s) = s \log s - s + 1$, $D_f$ is the relative entropy. For $f(s) = -\log s +
s - 1$, $D_f$ is the reversed relative entropy. For $f(s) = (\sqrt{s} - 1)^2$, $D_f$ is the
squared Hellinger distance. For $f(s) = (s - 1)^2/2$, $D_f$ is the relative Gini
concentration index. For more details about $f$-divergences see Liese and
Vajda [1987].
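The definitions above can be checked numerically on a finite state space. The following is a minimal sketch, assuming $\mu$ is the counting measure on three states and that all densities are strictly positive; the function names and the example vectors are illustrative assumptions, not part of the paper. It evaluates the relative entropy, the Bregman distance (4) and the $f$-divergence (7) for $f(s) = s \log s - s + 1$, for which all three should coincide.

```python
import math

def relative_entropy(p, p0):
    """I(p||p0) = sum [p log(p/p0) - p + p0] under the counting measure."""
    total = 0.0
    for s, t in zip(p, p0):
        if s > 0 and t == 0:
            return math.inf          # p is not dominated by p0
        total += t - s
        if s > 0:
            total += s * math.log(s / t)
    return total

def bregman(f, f_prime, p, p0):
    """Bregman distance (4)-(5); assumes f differentiable and p0 > 0."""
    return sum(f(s) - f(t) - f_prime(t) * (s - t) for s, t in zip(p, p0))

def f_divergence(f, p, p0):
    """f-divergence (7); assumes p0 > 0 so no convention at p0 = 0 is needed."""
    return sum(f(s / t) * t for s, t in zip(p, p0))

def f_kl(s):
    # f(s) = s log s - s + 1, extended by its limit f(0) = 1
    return s * math.log(s) - s + 1 if s > 0 else 1.0

def f_kl_prime(t):
    return math.log(t)

# Illustrative (assumed) probability vectors on three states.
p0 = [0.5, 0.3, 0.2]
p = [0.2, 0.3, 0.5]

kl = relative_entropy(p, p0)
breg = bregman(f_kl, f_kl_prime, p, p0)
fdiv = f_divergence(f_kl, p, p0)
hellinger = f_divergence(lambda s: (math.sqrt(s) - 1) ** 2, p, p0)
print(kl, breg, fdiv, hellinger)
```

For other choices of $f$ the Bregman distance and the $f$-divergence generally differ; the coincidence here is specific to $f(s) = s \log s - s + 1$.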



Relative entropy appears the most versatile divergence measure for
probability distributions or non-negative functions, extensively used in diverse
fields including statistics, information theory, statistical physics, see e.g.
Kullback [1959], Csiszár and Körner [2011], Jaynes [1957]. For its
applications in econometrics, see Golan et al. [1996] or Grechuk et al. [2009]. In the
context of this paper, Hansen and Sargent [2001] have used expected value
minimization over relative entropy balls. Arguments for (2) with any
$f$-divergence in the role of $D$, or more generally with a weighted $f$-divergence
involving a (positive) weight function $w(r)$ in the integral in (7), have been



^7 This makes sure that (7) indeed defines a divergence between any non-negative
functions; if attention is restricted to probability densities resp. probability distributions, it
suffices to assume that $f(1) = 0$.
put forward by Maccheroni et al. [2006]. Results of Ahmadi-Javid [2011]
indicate advantages of relative entropy over other $f$-divergences also in this
context. In another context, Grünwald and Dawid [2004] argue that
distances between distributions might be chosen in a utility dependent way.
Relative entropy is natural only for decision makers with logarithmic utility.
Picking up this idea, for decision makers with non-logarithmic utility one
might define the radius in terms of some utility dependent distance. We are
unaware of references employing (2) with Bregman distances, although this
would appear natural, particularly as Bregman distances have a beautiful
interpretation as measuring the expected utility losses due to the convexity
of $f$.

In the context of inference, the method of maximum entropy (or relative
entropy minimization) is distinguished by axiomatic considerations. Shore
and Johnson [1980], Paris and Vencovská [1990], and Csiszár [1991] showed
that it is the only method that satisfies certain intuitively desirable
postulates. Still, relative entropy cannot be singled out as providing the only
reasonable method of inference. Csiszár [1991] determined what
alternatives (specifically, Bregman distances and $f$-divergences) come into account
if some postulates are relaxed. In the context of measuring risk or
evaluating preferences under ambiguity aversion, axiomatic results distinguishing
relative entropy or some other divergence are not available.

An objection against the choice of the set $\Gamma$ in (2) with $D$ equal to
relative entropy or a related divergence should also be mentioned. It is that all
distributions in this set are absolutely continuous with respect to $P_0$. In the
literature of the subject, even if not working with divergences, it is a rather
common assumption that the set of feasible distributions is dominated; one
notable exception is Cont [2006]. Sometimes the assumption that $\Gamma$ is
dominated is hard to justify. For example, in a multiperiod setting where $\Omega$ is
the canonical space of continuous paths and $\Gamma$ is a set of martingale laws
for the canonical process, corresponding to different scenarios of volatilities,
this $\Gamma$ is typically not dominated (see Nutz and Soner [2012]). Or, if we
use a continuous default distribution $P_0$, can we always be sure that the
data generating process is not discrete? And should it not be possible to
approximate in some appropriate sense a continuous distribution by discrete
ones?

If an $f$-divergence with a non-cofinite $f$ is used, then the set $\Gamma$ of
alternative distributions is not dominated, see (8). But since all distributions
singular to $P_0$ have the same $f$-divergence from $P_0$, even $f$-divergences with
non-cofinite $f$ are not appropriate to describe the approximation of a
continuous distribution by discrete distributions. Bregman distances have a similar
shortcoming. In practice, this objection does not appear a serious obstacle,
for the set $\Gamma$ of theoretical alternatives may be extended by distributions
close to them in an appropriate sense involving closeness of expectations,
which negligibly changes the theoretical risk value (1).
4 Intuitive relation of worst case risk and maximum entropy inference

The purpose of this section is to develop intuition on the relation between
Problem (1) and the maximum entropy problem. Let us consider the
mathematically simplest case of Problem (1), when $\Gamma$ is a sufficiently small relative
entropy ball. Then Problem (1) requires the evaluation of

$$\inf_{P:\, I(P\|P_0) \le k} E_P(X) =: V(k), \tag{9}$$

for sufficiently small $k$. We follow Breuer and Csiszár [2012], using
techniques familiar in the theory of exponential families, see Barndorff-Nielsen
[1978], and large deviations theory, see Dembo and Zeitouni [1998]. The
meaning of 'sufficiently small' will be made precise later in this section. The
cases when the relative entropy ball is not 'sufficiently small' will be treated
in Section 5.

Observe that Problem (1) with $\Gamma$ a relative entropy ball is "inverse" to a
problem of maximum entropy inference. If an unknown distribution $P$ had
to be inferred when the available information specified only a feasible set
of distributions, and a distribution $P_0$ were given as a prior guess of $P$, the
maximum entropy^8 principle would suggest to infer the feasible distribution
$P$ which minimizes $I(P\|P_0)$. In particular, if the feasible distributions were
those with $E_P(X) = b$, for a constant $b$, we would arrive at the problem

$$\inf_{P:\, E_P(X) = b} I(P\|P_0). \tag{10}$$

Note that the objective function of Problem (1) is the constraint in the
maximum entropy problem (10), and vice versa (Fig. 1). It is therefore
intuitively expected that (taking $k$ and $b$ suitably related) both problems
are solved by the same distribution $\bar P$,

$$\arg\min_{P:\, I(P\|P_0) \le k} E_P(X) = \arg\min_{P:\, E_P(X) = b} I(P\|P_0) =: \bar P, \tag{11}$$

see Fig. 1. The literature on the maximum entropy problem establishes
that (under some regularity conditions) the solution $\bar P$ is a member of the
exponential family of distributions $P(\theta)$ with canonical statistic $X$, which
have a $P_0$-density

$$\frac{dP(\theta)}{dP_0}(r) := \frac{e^{\theta X(r)}}{\int e^{\theta X(r)}\, dP_0(r)} = e^{\theta X(r) - \Lambda(\theta)}, \tag{12}$$

where $\theta \in \mathbb{R}$ is a parameter and the function $\Lambda$ is defined as

$$\Lambda(\theta) := \log \int e^{\theta X(r)}\, dP_0(r). \tag{13}$$



^8 This name refers to the special case when $P_0$ is the uniform distribution; then
minimising $I(P\|P_0)$ is equivalent to maximising the Shannon differential entropy of $P$.
Figure 1: Relation of Worst Case and Maximum Entropy problem.
What is the objective function in Problem (1) is the constraint in the
maximum entropy problem (10), and vice versa.



Among actuaries the distributions from the exponential family are often
referred to as Esscher transforms.

For members $P(\theta)$ of the exponential family, the expected profit can be
written as

$$E_{P(\theta)}(X) = \int X(r) \exp(\theta X(r) - \Lambda(\theta))\, dP_0(r) = \Lambda'(\theta), \tag{14}$$

and the relative entropy to $P_0$ is

$$I(P(\theta)\|P_0) = \int \log\frac{dP(\theta)}{dP_0}(r)\, dP(\theta)(r) = \int (\theta X(r) - \Lambda(\theta))\, dP(\theta)(r)
 = \theta E_{P(\theta)}(X) - \Lambda(\theta) = \theta\Lambda'(\theta) - \Lambda(\theta). \tag{15}$$

If the identity (11) holds and the solution of Problem (9) is from the
exponential family, then one can determine which member of the exponential
family solves the problem, by solving the equation

$$\theta\Lambda'(\theta) - \Lambda(\theta) = k \tag{16}$$

for $\theta$. Typically, (16) has both a positive and a negative solution, and the
corresponding $P(\theta)$ is the maximiser resp. minimiser of $E_P(X)$ subject to
$I(P\|P_0) \le k$. Call the negative solution $\bar\theta$. The solution to Problem (9)
can then be expressed in terms of the $\Lambda$-function:

$$\inf_{P:\, I(P\|P_0) \le k} E_P(X) = \inf_{\theta:\, \theta\Lambda'(\theta) - \Lambda(\theta) \le k} \Lambda'(\theta) = \Lambda'(\bar\theta).$$

(The last equality follows from the convexity of $\Lambda$.) This solution is
illustrated in Fig. 2. The worst expected profit $V(k)$ is the slope of the tangent to
Figure 2: Solution of the worst case problem from the
$\Lambda$-function. The optimal value achieved for Problem (9) is the slope of the
tangent to the curve $\Lambda(\theta)$ passing through $(0, -k)$. $\bar\theta$ is the $\theta$-coordinate of
the tangent point.



the curve $\Lambda(\theta)$ passing through $(0, -k)$. $\bar\theta$ is the $\theta$-coordinate of the tangent
point. From the figure it is obvious that $\bar\theta\Lambda'(\bar\theta) - \Lambda(\bar\theta) = k$.
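The generic-case recipe (solve (16) for its negative root, then read off $\Lambda'(\bar\theta)$) can be sketched in code. This is a minimal illustration, assuming a finite state space so that $P_0$ is a probability vector and $X$ a vector of portfolio values; the bisection solver, the function names and the example numbers are assumptions made for illustration, not the authors' implementation. It uses the fact that $\theta\Lambda'(\theta) - \Lambda(\theta)$ has derivative $\theta\Lambda''(\theta)$ and is therefore decreasing in $\theta$ on the negative half-line.

```python
import math

def log_mgf(theta, x, p0):
    """Lambda(theta) = log sum_i p0_i exp(theta * x_i), eq. (13), computed stably."""
    shift = max(theta * xi for xi in x)
    return shift + math.log(sum(pi * math.exp(theta * xi - shift)
                                for xi, pi in zip(x, p0)))

def tilted(theta, x, p0):
    """Exponential family member P(theta) of eq. (12)."""
    lam = log_mgf(theta, x, p0)
    return [pi * math.exp(theta * xi - lam) for xi, pi in zip(x, p0)]

def mean(x, p):
    return sum(xi * pi for xi, pi in zip(x, p))

def entropy_of_tilt(theta, x, p0):
    """theta * Lambda'(theta) - Lambda(theta), i.e. I(P(theta)||P0) by eq. (15)."""
    return theta * mean(x, tilted(theta, x, p0)) - log_mgf(theta, x, p0)

def worst_case(x, p0, k):
    """Return (theta_bar, V(k)) in the generic case 0 < k < k_max."""
    k_max = -math.log(sum(pi for xi, pi in zip(x, p0) if xi == min(x)))
    if not 0 < k < k_max:
        raise ValueError("generic case needs 0 < k < k_max")
    lo, hi = -1.0, 0.0
    while entropy_of_tilt(lo, x, p0) < k:   # widen until the root is bracketed
        lo *= 2.0
    for _ in range(200):                    # bisection on the negative half-line
        mid = 0.5 * (lo + hi)
        if entropy_of_tilt(mid, x, p0) < k:
            hi = mid                        # entropy too small: move further from 0
        else:
            lo = mid
    theta_bar = 0.5 * (lo + hi)
    return theta_bar, mean(x, tilted(theta_bar, x, p0))

# Illustrative (assumed) four-state example.
x = [-2.0, 0.0, 1.0, 3.0]
p0 = [0.1, 0.4, 0.3, 0.2]
theta_bar, v = worst_case(x, p0, k=0.1)
print(theta_bar, v, -v)   # tilt parameter, worst expected value V(k), MR = -V(k)
```

The guard on `k_max` corresponds to condition (i) stated later in this section; for a finite state space the remaining conditions hold automatically because $\Lambda$ is finite everywhere.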

So far the intuition about the solution in what one could call the generic
case. It requires two important assumptions: Identity (11) should hold and
the equation (16) should have a (unique) negative solution $\bar\theta$. Breuer and
Csiszár [2012] give precise conditions under which the solution is indeed of
the generic form above. The first condition is relevant when $X$ is essentially
bounded below, the other two when it is not:

(i) If $\operatorname{ess\,inf}(X)$ is finite, $k < k_{\max} := -\log P_0(\{r : X(r) = \operatorname{ess\,inf}(X)\})$.

(ii) $\theta_{\min} := \inf\{\theta : \Lambda(\theta) < +\infty\} < 0$,

(iii) If $\theta_{\min}$, $\Lambda(\theta_{\min})$, and $\Lambda'(\theta_{\min})$ are all finite then
$k < \theta_{\min}\Lambda'(\theta_{\min}) - \Lambda(\theta_{\min})$.

The concepts used above are in close analogy to statistical mechanics.
The risk factor vector $r$ is the counterpart of the phase space points. The
pricing function $X$ is the counterpart of the energy function. $\Lambda$ is the
counterpart of the logarithm of the partition function $Z$. $\theta$ is the counterpart
of the inverse temperature parameter $\beta = 1/kT$. The worst case
distribution (12) is the counterpart of the canonical distribution.
5 Maximum Loss over relative entropy balls: The pathological cases

Now let us turn to the solution of Problem (9) in the pathological case
where $\Gamma$ is a large relative entropy ball, so that one of the conditions (i)-(iii)
is violated.

First consider the case that assumption (i) above is violated, where the
loss is essentially bounded and the ball is not "sufficiently small". Long
bond portfolios are examples for this case. In this case equation (16) has no
negative solution. The shape of the $\Lambda$-function is displayed in Fig. 3.

Proposition 1. If $\operatorname{ess\,inf}(X)$ is finite, and $k \ge k_{\max}$ (defined in (i),
Section 4) then the solution to Problem (9) is $V(k) = \operatorname{ess\,inf}(X)$. The worst
case distribution $\bar P$ has the $P_0$-density

$$\frac{d\bar P}{dP_0}(r) := \begin{cases} 1/\beta & \text{if } X(r) = \operatorname{ess\,inf}(X) \\ 0 & \text{otherwise,} \end{cases}$$

where $\beta = P_0(\{r : X(r) = \operatorname{ess\,inf}(X)\})$.
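Proof. The distribution $\bar P$ satisfies

$$I(\bar P\|P_0) = \int \log\frac{1}{\beta}\, d\bar P = -\log\beta,$$

hence $I(\bar P\|P_0) \le k$ if $k \ge -\log\beta$. Then $V(k) \le E_{\bar P}(X) = \operatorname{ess\,inf}(X)$.
Trivially $V(k) \ge \operatorname{ess\,inf}(X)$. The claim $V(k) = \operatorname{ess\,inf}(X)$ follows. □

Figure 3: The pathological case of Proposition 1.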
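Proposition 1 is easy to verify on a finite state space. The sketch below assumes a discrete $P_0$ given as a probability vector; the helper names and the example numbers are illustrative assumptions. It builds the conditional distribution on the set where $X$ attains its minimum and checks that its relative entropy to $P_0$ equals $k_{\max} = -\log\beta$, so that it lies in every ball with $k \ge k_{\max}$.

```python
import math

def conditional_on_min(x, p0):
    """P0 conditioned on {X = min X}: the worst case distribution of Proposition 1."""
    x_min = min(x)
    beta = sum(pi for xi, pi in zip(x, p0) if xi == x_min)
    p_bar = [pi / beta if xi == x_min else 0.0 for xi, pi in zip(x, p0)]
    return p_bar, beta

def kl(p, p0):
    """Relative entropy I(P||P0) for probability vectors with P << P0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, p0) if pi > 0)

# Illustrative (assumed) four-state example.
x = [-2.0, 0.0, 1.0, 3.0]
p0 = [0.1, 0.4, 0.3, 0.2]

p_bar, beta = conditional_on_min(x, p0)
k_max = -math.log(beta)
worst_value = sum(xi * pi for xi, pi in zip(x, p_bar))
print(k_max, kl(p_bar, p0), worst_value)
```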

Next consider the pathological case that assumption (ii) above is violated
so that $\theta_{\min} = 0$, and thus $\Lambda(\theta) = +\infty$ for all $\theta < 0$.
Proposition 2. If $\theta_{\min}$ defined in (ii) of Section 4 equals zero, then the
solution to Problem (9) is $V(k) = -\infty$ for all $k > 0$.

Proof. Let $\beta_{m,n} := P_0(\{r : -n \le X(r) \le m\})$ and consider the measures
$P_{m,n} \ll P_0$ with

$$\frac{dP_{m,n}}{dP_0}(r) := \begin{cases} 1/\beta_{m,n} & \text{if } -n \le X(r) \le m \\ 0 & \text{otherwise.} \end{cases}$$

Obviously $I(P_{m,n}\|P_0) = -\log\beta_{m,n}$. For any $P \ll P_{m,n}$,

$$I(P\|P_0) = I(P\|P_{m,n}) - \log\beta_{m,n}$$

is arbitrarily close to $I(P\|P_{m,n})$ if $m$ and $n$ are sufficiently large. Hence to
prove that $V(k) = -\infty$ for all $k > 0$, it suffices to find, for any given $m$ and
sufficiently large $n$, distributions $P \ll P_{m,n}$ with $I(P\|P_{m,n})$ arbitrarily close
to zero and $E_P(X)$ arbitrarily low.

In the rest of this proof, $m$ is fixed and $n$ will go to $+\infty$. Define $P$ and
$\Lambda_{m,n}$ by

$$\frac{dP}{dP_{m,n}}(r) := \frac{e^{\theta X(r)}}{\int e^{\theta X(r)}\, dP_{m,n}(r)} =: e^{\theta X(r) - \Lambda_{m,n}(\theta)}$$

for any $\theta < 0$. $P$ and $\Lambda_{m,n}$ depend on $\theta$. As in (14), $E_P(X) = \Lambda'_{m,n}(\theta)$ and
$I(P\|P_{m,n}) = \theta\Lambda'_{m,n}(\theta) - \Lambda_{m,n}(\theta)$ for any $\theta < 0$. For each $\theta$,

$$-\theta m \ge -\Lambda_{m,n}(\theta) = \int_\theta^0 \Lambda'_{m,n}(\xi)\, d\xi \ge -\theta\Lambda'_{m,n}(\theta), \tag{17}$$

since $\Lambda'_{m,n}$ is increasing. For fixed $\theta < 0$, $\Lambda_{m,n}(\theta) \to \infty$ as $n \to \infty$ since
$\Lambda(\theta) = \infty$ by assumption. By (17) it follows that $\Lambda'_{m,n}(\theta) \to -\infty$ as $n \to \infty$,
and hence there exists a sequence $\theta_n \uparrow 0$ such that $\Lambda'_{m,n}(\theta_n) \to -\infty$ and
$\theta_n\Lambda'_{m,n}(\theta_n) \to 0$ as $n \to \infty$. By inequality (17), this implies $|\Lambda_{m,n}(\theta_n)| \to 0$
and hence $I(P\|P_{m,n}) \to 0$ as $n \to \infty$. This completes the proof that, for $P$
defined with $\theta = \theta_n$, $E_P(X)$ will be arbitrarily low and $I(P\|P_{m,n})$ arbitrarily
small but positive. □

Finally consider the case that both $\Lambda(\theta_{\min})$ and $\Lambda'(\theta_{\min})$ are finite, but
the ball is not "sufficiently small". The shape of the $\Lambda$-function is
displayed in Fig. 4.

Proposition 3. If $-\infty < \theta_{\min} < 0$ ($\theta_{\min}$ is defined in (ii) of Section 4)
and both $\Lambda(\theta_{\min})$ and $\Lambda'(\theta_{\min})$ are finite, and additionally
$k > \theta_{\min}\Lambda'(\theta_{\min}) - \Lambda(\theta_{\min})$, then

$$V(k) = (k + \Lambda(\theta_{\min}))/\theta_{\min}, \tag{18}$$

but there is no distribution achieving this value; the infimum in Problem (9)
is not a minimum.
Figure 4: The pathological case of Proposition 3.



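Proof. Define $P(\theta_{\min})$ as in (12) with $\theta_{\min}$ in the place of $\theta$. Then

$$I(P\|P_0) = I(P\|P(\theta_{\min})) + \int \log\big(\exp(\theta_{\min}X(r) - \Lambda(\theta_{\min}))\big)\, dP(r)
 = I(P\|P(\theta_{\min})) + \theta_{\min}E_P(X) - \Lambda(\theta_{\min}) \tag{19}$$

for all $P \ll P(\theta_{\min})$. Hence, if $I(P\|P_0) \le k$ then (using $\theta_{\min} < 0$)

$$E_P(X) \ge \big(k + \Lambda(\theta_{\min}) - I(P\|P(\theta_{\min}))\big)/\theta_{\min}, \tag{20}$$

proving that $V(k) \ge (k + \Lambda(\theta_{\min}))/\theta_{\min}$. To show that equality holds, apply
the result of Proposition 2 to $P(\theta_{\min})$ in the role of $P_0$; then the role of $\Lambda(\theta)$
is played by

$$\tilde\Lambda(\theta) := \log \int e^{\theta X(r)}\, dP(\theta_{\min})(r) = \Lambda(\theta + \theta_{\min}) - \Lambda(\theta_{\min}).$$

Clearly, $\tilde\Lambda(\theta) = \infty$ for all $\theta < 0$, hence by Proposition 2 there exist
distributions $P'$ with $I(P'\|P(\theta_{\min}))$ arbitrarily small but positive and $E_{P'}(X)$
arbitrarily low. Then, for any small $\epsilon > 0$, a suitable linear combination $P$ of $P'$
and $P(\theta_{\min})$ satisfies $E_P(X) = (k + \Lambda(\theta_{\min}) - \epsilon)/\theta_{\min}$ and $I(P\|P(\theta_{\min})) < \epsilon$.
For this $P$, eq. (19) implies that $I(P\|P_0) < k$ and the claim
$V(k) \le (k + \Lambda(\theta_{\min}))/\theta_{\min}$ follows. This proves that $V(k) = (k + \Lambda(\theta_{\min}))/\theta_{\min}$.
(20) implies that in Problem (9) the infimum is not attained because
$I(P\|P(\theta_{\min}))$ is strictly positive when $E_P(X) < \Lambda'(\theta_{\min}) = E_{P(\theta_{\min})}(X)$. □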
Remark 1. Consider the convex conjugate of $\Lambda$ defined by

$$\Lambda^*(x) := \sup_\theta\, (\theta x - \Lambda(\theta)), \tag{21}$$

which is a convex, lower semicontinuous function on $\mathbb{R}$. Clearly,

$$\Lambda^*(x) = \theta x - \Lambda(\theta) \quad \text{if } x = \Lambda'(\theta). \tag{22}$$

However, for some $x$ perhaps no $\theta$ satisfies $x = \Lambda'(\theta)$. In the generic case,
when the assumptions (i)-(iii) of Section 4 are met, the optimal value
attained in Problem (1) is equal to $x = \Lambda'(\theta)$ for $\theta$ satisfying (16), which $x$ is
the unique solution of

$$\Lambda^*(x) = k \quad \text{and} \quad x < E_{P_0}(X). \tag{23}$$

The proof of Proposition 3 establishes that $V(k)$ always equals the solution
of (23) when it exists, even if (16) does not have a solution.
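As an illustrative check that (16) and (23) lead to the same value, consider the following worked example, which is not taken from the source: assume $X$ is normally distributed under $P_0$ with mean $m_X$ and variance $\sigma^2 > 0$ (the symbols $m_X$ and $\sigma$ are introduced here only for this sketch). Then $\Lambda$ is finite everywhere and $X$ is unbounded below, so conditions (i)-(iii) of Section 4 hold for every $k > 0$.

```latex
% Assumption for this sketch: X ~ N(m_X, sigma^2) under P_0.
\begin{align*}
\Lambda(\theta) &= m_X\theta + \tfrac{1}{2}\sigma^2\theta^2, &
\Lambda'(\theta) &= m_X + \sigma^2\theta, \\
\theta\Lambda'(\theta) - \Lambda(\theta) &= \tfrac{1}{2}\sigma^2\theta^2 = k
  &&\Longrightarrow\quad \bar\theta = -\frac{\sqrt{2k}}{\sigma}, \\
V(k) &= \Lambda'(\bar\theta) = m_X - \sigma\sqrt{2k}, \\
\Lambda^*(x) &= \sup_\theta\,(\theta x - \Lambda(\theta)) = \frac{(x - m_X)^2}{2\sigma^2}, \\
\Lambda^*(x) &= k,\ x < m_X
  &&\Longrightarrow\quad x = m_X - \sigma\sqrt{2k} = V(k).
\end{align*}
```

Under this assumption the worst expected value lies $\sigma\sqrt{2k}$ below the best guess mean, so the model risk grows with the square root of the radius $k$.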



6 A more general framework

Now we construct a unified framework that covers the choices of $\Gamma$ in (2)
when $D$ is an $f$-divergence or a Bregman distance, as well as others. In this
framework, $\Gamma$ is chosen as a set of probability measures $P \ll \mu$ (where $\mu$ is
a given measure on $\Omega$, finite or $\sigma$-finite) of the form

$$\Gamma = \{P \ll \mu : p = dP/d\mu \text{ satisfies } H(p) \le k\}, \tag{24}$$

where $H$ is a convex integral functional defined as

$$H(p) := \int_\Omega \beta(r, p(r))\, \mu(dr), \tag{25}$$

for measurable, non-negative functions $p$ on $\Omega$. Here $\beta : \Omega \times (0, +\infty) \to \mathbb{R}$ is a
mapping such that $\beta(r, s)$ is a measurable function of $r$ for each $s \in (0, +\infty)$
and a strictly convex function of $s$ for each $r \in \Omega$. The definition of $\beta$ is
extended to $s \le 0$ by

$$\beta(r, 0) := \lim_{s \downarrow 0} \beta(r, s), \qquad \beta(r, s) := +\infty \text{ if } s < 0. \tag{26}$$

No differentiability assumptions are made about $\beta$ but the convenient
notations $\beta'(r, 0)$ and $\beta'(r, +\infty)$ will be used for the common limits of the left
and right derivatives of $\beta(r, s)$ by $s$ as $s \downarrow 0$ resp. $s \uparrow +\infty$. Note that

$$\beta'(r, +\infty) = \lim_{s \uparrow +\infty} \frac{\beta(r, s)}{s}. \tag{27}$$
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With the understandings (26), the mapping /3 : x M — )• (—00, +00] is a 



convex normal integrand in the sense of Rockafellar and Wets [1997| , which 

I9l 



ensures the measurabilitjrl of the function (3{r,p{r)) in (25) and of similar 
functions later on, as in (|36]) and pS] ). 

Depending on the choice of /3, H{j)) will be relative entropy to Pq) some 
Bregman distance, some /-divergence, or some other divergence, as in Sec- 
tion [7] below. Our general assumption about the relation of /3 and the best 
guess distribution Pqi always satisfied in the above cases, will be that the 
minimum of H(p) among probability densities p is attained for pQ, the den- 
sity of Po; without any loss of generality, this minimum is supposed to be 0, 
thus 

H{p) > H{pq) = whenever J pdfi = 1. (28) 
In addition, we assume that Ef^ {X) = J Xp^dfj, exists and 

m := fi-ess inf(X) < bo := Ep^{X) < M := //-ess sup(X). (29) 

Relation of the model risk problem and the moment problem. The distribution model risk (1) with $\Gamma$ as in (24) is evaluated by solving the worst case problem

$$\inf_{p:\ \int p\,d\mu = 1,\ H(p) \le k} \int X p\,d\mu =: V(k) \tag{30}$$

and then taking $MR = -V(k)$. Our goal is to determine $V(k)$, and also the minimiser (the density of the worst case scenario in $\Gamma$), if $V(k)$ is finite and the minimum in (30) is attained. If this minimiser exists, it is unique, by strict convexity of $\beta$.

Problem (30) is related to the moment problem

$$\inf_{p:\ \int p\,d\mu = 1,\ \int X p\,d\mu = b} H(p) =: F(b) \tag{31}$$

in analogy to the relation between problem (9) and the maximum entropy problem (10) described in Section 4. Denote

$$k_{\max} := \lim_{b \downarrow m} F(b).$$

$^{9}$ Measurability issues will not be entered below. For the measurability of the functions we deal with, see the references in Csiszár and Matúš [2012] to the book of Rockafellar and Wets [1997].

Proposition 4. Supposing

$$0 < k < k_{\max}, \tag{32}$$

there exists a unique $b$ with $m < b < b_0$ and

$$F(b) = k, \tag{33}$$

and then the solution to problem (30) has the value

$$V(k) = b. \tag{34}$$

The minimum in (30) is attained if and only if that in (31) is attained (for this $b$), in which case the same $p$ attains both minima.

Proof. As the convex function $F$ attains its minimum 0 at $b_0$, the assumption (32) trivially implies the existence of a unique $b$ satisfying (33). Moreover, then each $t \in (b, b_0)$ satisfies $F(t) < k$, hence there exist functions $p$ with $\int p\,d\mu = 1$, $\int X p\,d\mu = t$ such that $H(p) < k$. This proves that $V(k) \le b$. On the other hand, $F(t) > k$ if $t \in (m, b)$ (hence also $F(m) > k$ if $m$ is finite), which means that the conditions $\int p\,d\mu = 1$ and $\int X p\,d\mu = t$ imply $H(p) \ge F(t) > k$ for each $t \in (-\infty, b)$. Since $\int X p\,d\mu > -\infty$ if $H(p) < \infty$, as verified later (Corollary 3 of Theorem 2), this proves that $V(k) \ge b$. The last assertion of the Proposition follows obviously. □



Remark 2. The condition (32) in Proposition 4 covers all interesting values of $k$. Indeed, one easily sees that if $k > k_{\max}$ or $k \ge k_{\max} > 0$ then $V(k) = m$, while clearly $V(0) = b_0$. This also means that the functional $H$ can be suitable for assigning model risk only if $k_{\max} > 0$. A necessary and sufficient condition for $k_{\max} > 0$, analogous to condition (ii) in Section 4, will be given in Corollary 2 of Theorem 2. Note that if $m = -\infty$ then $k_{\max} > 0$ implies $k_{\max} = \infty$, in which case each $k > 0$ meets condition (32).



For technical reasons, it will be convenient to regard $F(b)$ as the instance $a = 1$ of the function

$$J(a, b) := \inf_{p:\ \int p\,d\mu = a,\ \int X p\,d\mu = b} H(p), \qquad (a, b) \in \mathbb{R}^2. \tag{35}$$

Problem (35) is a special case of minimising convex integral functionals under moment constraints, which has an extensive literature. For references, see the recent work of Csiszár and Matúš [2012], relied upon here also for results that date back much earlier, perhaps under less general conditions. The results in Csiszár and Matúš [2012] will be used (without further mentioning this) with the choice $\phi : r \mapsto (1, X(r))$ of the moment mapping, when the "value function" there reduces to the function $J$ here. Many results in that reference need a condition called dual constraint qualification which, however, always holds in the current setting; namely, the set $\Theta$ defined in (39) is non-empty (see the passage following (39)).



The role of the function $\Lambda$ in Section 4 will be played by the function

$$K(\theta_1, \theta_2) := \int_\Omega \beta^*(r, \theta_1 + \theta_2 X(r))\,\mu(dr), \qquad (\theta_1, \theta_2) \in \mathbb{R}^2, \tag{36}$$

where $\beta^*$ is the convex conjugate of $\beta$ with respect to the second variable,

$$\beta^*(r, x) := \sup_{s \in \mathbb{R}}\,(x s - \beta(r, s)), \qquad x \in \mathbb{R}. \tag{37}$$

The properties of $\beta$ imply that $\beta^*(r, x)$ is a convex function of $x$ which is finite, non-decreasing, and differentiable in the interval $(-\infty, \beta'(r, +\infty))$, see (27). At $x = \beta'(r, +\infty)$, if finite, $\beta^*(r, x)$ may be finite or $+\infty$. The derivative $(\beta^*)'(r, x)$ equals zero for $x \le \beta'(r, 0)$, is positive for $\beta'(r, 0) < x < \beta'(r, +\infty)$, and grows to $+\infty$ as $x \uparrow \beta'(r, +\infty)$.
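As an illustration of the conjugate (37), the following sketch takes the autonomous integrand $\beta(r, s) = f(s) = s \log s - s + 1$ (the relative entropy case) and compares a crude grid approximation of $\sup_s (xs - f(s))$ with the closed form $f^*(x) = e^x - 1$. The grid size, the cut-off `s_max` and the function names are assumptions made only for this check.

```python
import math

def f(s):
    """f(s) = s log s - s + 1, extended to s <= 0 as in (26)."""
    if s < 0.0:
        return math.inf
    if s == 0.0:
        return 1.0  # limit of f(s) as s decreases to 0
    return s * math.log(s) - s + 1.0

def conjugate_on_grid(x, s_max=20.0, n=100_000):
    """Grid approximation of sup_s (x*s - f(s)) over s in [0, s_max]."""
    return max(x * (s_max * i / n) - f(s_max * i / n) for i in range(n + 1))

def conjugate_closed_form(x):
    """f*(x) = exp(x) - 1; its derivative exp(x) is positive and increasing."""
    return math.exp(x) - 1.0

# Pairs (grid value, closed form) at a few hypothetical test points.
GRID_CHECK = {x: (conjugate_on_grid(x), conjugate_closed_form(x))
              for x in (-2.0, 0.0, 1.5)}
```

The grid value can never exceed the true supremum, so it approaches the closed form from below; here $\beta'(r, 0) = -\infty$ and $\beta'(r, +\infty) = +\infty$, so the derivative of the conjugate is positive on the whole real line.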

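The following functions on $\Omega$ will play the role of the exponential family, but are parametrised by two variables and need not integrate to 1:

$$p_\theta(r) := (\beta^*)'(r, \theta_1 + \theta_2 X(r)), \qquad \theta = (\theta_1, \theta_2) \in \Theta, \tag{38}$$

where$^{10}$

$$\Theta := \{\theta : K(\theta_1, \theta_2) < +\infty,\ \theta_1 + \theta_2 X(r) < \beta'(r, +\infty)\ \ \mu\text{-a.e.}\}. \tag{39}$$

The properties of $\beta^*$ stated above imply for any $(\theta_1, \theta_2)$ in the effective domain $\operatorname{dom} K := \{(\theta_1, \theta_2) : K(\theta_1, \theta_2) < +\infty\}$ of $K$ that $(\tilde\theta_1, \theta_2) \in \Theta$ for each $\tilde\theta_1 < \theta_1$. In particular, $\Theta$ contains the interior of $\operatorname{dom} K$. If $\beta'(r, +\infty) = +\infty$ $\mu$-a.e. then $\Theta = \operatorname{dom} K$. As verified later, see Remark 3, the default density $p_0$ is equal to $p_{(\theta_0, 0)}$ for some $\theta_0$ with $(\theta_0, 0) \in \Theta$.

The function $K$ is equal to the convex conjugate of $J$:

$$K(\theta_1, \theta_2) = J^*(\theta_1, \theta_2) := \sup_{(a, b) \in \mathbb{R}^2}\,(\theta_1 a + \theta_2 b - J(a, b)), \tag{40}$$

see Csiszár and Matúš [2012, Theorem 1.1]. In particular, $K$ is a lower semicontinuous proper$^{11}$ convex function. Also, $K$ is differentiable in the interior of $\operatorname{dom} K$, and

$$\nabla K(\theta) = \left(\int p_\theta\,d\mu,\ \int X p_\theta\,d\mu\right), \qquad \theta = (\theta_1, \theta_2) \in \operatorname{int\,dom} K, \tag{41}$$

see Csiszár and Matúš 2012, Corollary 3.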



$^{10}$ The definition (39) makes sure that the derivative in (38) exists for $\mu$-a.e. $r \in \Omega$ if $(\theta_1, \theta_2) \in \Theta$. For all other $r \in \Omega$, if any, one may set by definition $p_{\theta_1, \theta_2} = 0$.

$^{11}$ I.e., it never equals $-\infty$ and is not identically $+\infty$.

Main results. We calculate $b$ satisfying (33), which by (34) amounts to solving problem (30), by evaluating instead of $J$ the function $K^*$, using the identity $J^* = K$ which implies (Rockafellar [1970, Theorem 12.2])

$$J(a, b) = K^*(a, b), \qquad (a, b) \in \operatorname{int\,dom} J. \tag{42}$$

Here $K^*$ is the convex conjugate of $K$,

$$K^*(a, b) := \sup_{(\theta_1, \theta_2) \in \mathbb{R}^2}\,(\theta_1 a + \theta_2 b - K(\theta_1, \theta_2)), \qquad (a, b) \in \mathbb{R}^2, \tag{43}$$

and the interior of the effective domain of $J$ is, by Csiszár and Matúš [2012, Lemma 6.6],

$$\operatorname{int\,dom} J = \{(a, b) : a > 0,\ am < b < aM\}. \tag{44}$$

Proposition 4 and (42), (44) imply for $0 < k < k_{\max}$ the analogue of Remark 1: a (unique) $b$ satisfies

$$K^*(1, b) = k \quad \text{and} \quad b < b_0 = E_{P_0}(X), \tag{45}$$

and then $V(k) = b$. This already provides a recipe for computing $V(k)$. In regular cases, a more explicit solution is available, based on the following key result about Problem (35), see Csiszár and Matúš [2012, Lemma 4.4, Lemma 4.10]:



Lemma 1. If $\theta = (\theta_1, \theta_2) \in \Theta$ satisfies

$$\int p_\theta\,d\mu = a, \qquad \int X p_\theta\,d\mu = b \tag{46}$$

then it attains the maximum in (43). Moreover, in case $(a, b) \in \operatorname{int\,dom} J$, the existence of $\theta \in \Theta$ satisfying (46) is necessary and sufficient for the attainment of the minimum in (35), and then $p = p_\theta$ is the (unique) minimiser.



Theorem 1. Assuming (28), (29), (32), if

$$\theta_2 < 0, \qquad \int p_\theta\,d\mu = 1, \qquad \theta_1 + \theta_2 \int X p_\theta\,d\mu - K(\theta) = k \tag{47}$$

for some $\theta = (\theta_1, \theta_2) \in \Theta$ then the value of the inf in (30) is

$$V(k) = \int X p_\theta\,d\mu. \tag{48}$$

Essential smoothness$^{12}$ of $K$ is a sufficient condition for the existence of such $\theta$. Further, a necessary and sufficient condition for $p$ to attain the minimum in (30) is $p = p_\theta$ for the $\theta \in \Theta$ satisfying (47).



$^{12}$ A lower semicontinuous proper convex function is essentially smooth if its effective domain has nonempty interior, the function is differentiable there, and at non-interior points of the effective domain the directional derivatives in directions towards the interior are $-\infty$. The latter trivially holds if the effective domain is open.

Corollary 1. If the equations

$$\frac{\partial K(\theta)}{\partial \theta_1} = 1, \qquad \theta_1 + \theta_2 \frac{\partial K(\theta)}{\partial \theta_2} - K(\theta) = k \tag{49}$$

have a solution $\theta = (\theta_1, \theta_2) \in \operatorname{int\,dom} K$ with $\theta_2 < 0$ then $\theta$ satisfies (47) and the solution to Problem (30) equals

$$V(k) = \frac{\partial K(\theta)}{\partial \theta_2}. \tag{50}$$

The Corollary follows from the Theorem because, for $\theta \in \operatorname{int\,dom} K$, the equations in (47) are equivalent to those in (49), by (41). However, if $K$ is not essentially smooth, $\theta \in \operatorname{int\,dom} K$ is not a necessary condition for (47).



Proof. By Lemma 1, if $\theta = (\theta_1, \theta_2) \in \Theta$ satisfies

$$\int p_\theta\,d\mu = 1, \qquad \int X p_\theta\,d\mu = b \tag{51}$$

then it attains the maximum in (43). It follows, using (42), that (47) implies for $b := \int X p_\theta\,d\mu$, if it satisfies $m < b < M$, that

$$F(b) = J(1, b) = \theta_1 + \theta_2 b - K(\theta) = k. \tag{52}$$

Due to Proposition 4, to prove (48) it remains to show that $m < b < b_0$. Clearly, $k < k_{\max}$ implies $m < b$. Further, (43) and (52) imply

$$F(t) = K^*(1, t) \ge \theta_1 + \theta_2 t - K(\theta_1, \theta_2) = F(b) + \theta_2 (t - b), \qquad t \in (m, M). \tag{53}$$

Since $\theta_2 < 0$, this shows that $F(t) > F(b)$ if $t \in (m, b)$. As $F(b_0) = 0 < k = F(b)$, it follows that $b < b_0$, completing the proof of (48).

Suppose next that $K$ is essentially smooth. Then to $b$ in (33) there exists $\theta \in \operatorname{int\,dom} K$ with

$$(1, b) = \nabla K(\theta), \tag{54}$$

because $(1, b) \in \operatorname{int\,dom} J$ and the gradient vectors of the essentially smooth $K$ cover $\operatorname{int\,dom} K^* = \operatorname{int\,dom} J$, see Rockafellar [1970, Corollary 26.4.1]. Clearly, (54) implies that $\theta$ attains the maximum in (43), hence it satisfies (52). This means by (54) that $\theta$ satisfies the equations in (49), equivalent to those in (47). It remains to show that $\theta_2 < 0$, but this follows from (53) applied to $t = b_0$.

Finally, the last assertion of Theorem 1 follows from Proposition 4 and Lemma 1. □

Conditions for $k_{\max} > 0$. In Proposition 4 and Theorem 1 the condition $k_{\max} > 0$ has been assumed. In this subsection we give a necessary and sufficient condition for this to hold. We begin with a remark.

Remark 3. A simpler instance of Csiszár and Matúš [2012, Lemma 4.10] than Lemma 1, namely with the constant mapping $r \mapsto 1$ taken for the moment mapping $\phi$, gives the following: the necessary and sufficient condition for $p$ to minimise $H(p)$ subject to $\int p\,d\mu = a$ ($a > 0$) is that $p(r) = (\beta^*)'(r, \theta)$ for some $\theta \in \mathbb{R}$ with $\beta^*(r, \theta)$ $\mu$-integrable, and then the minimum is equal to $a\theta - \int \beta^*(r, \theta)\,d\mu(r)$. This establishes the claim that the default density $p_0$, minimising $H(p)$ subject to $\int p\,d\mu = 1$, equals $p_{(\theta_0, 0)}$ for some $\theta_0$ with $(\theta_0, 0) \in \Theta$; this $\theta_0$ also satisfies $\theta_0 - \int \beta^*(r, \theta_0)\,d\mu(r) = H(p_0) = 0$.



Theorem 2. Assuming (28), (29), for $b < b_0$ we have $F(b) > 0$ if and only if

$$\text{there exists } \theta = (\theta_1, \theta_2) \in \operatorname{dom} K \text{ with } \theta_2 < 0. \tag{55}$$

Proof. To prove the necessity of (55), we may assume $m < b < b_0$. Then $(1, b) \in \operatorname{int\,dom} J$, see (44), hence the convex function $J$ has nonempty subgradient at $(1, b)$ [Rockafellar 1970, Theorem 23.4]. As $J^* = K$, if $\theta = (\theta_1, \theta_2)$ belongs to that subgradient then

$$F(b) = J(1, b) = \theta_1 + \theta_2 b - K(\theta) \tag{56}$$

by [Rockafellar 1970, Theorem 23.5], which implies as in the proof of Theorem 1 that this $\theta$ also satisfies (53). In turn, (53) with $t = b_0$ implies that $\theta_2 \le 0$, with the strict inequality if $F(b) > 0$. This proves the necessity of (55).

For sufficiency, suppose that $F(b) = 0$ for some $b \ne b_0$, $m < b < M$. By Remark 3, then $F(b) = 0 = \theta_0 - K(\theta_0, 0)$, hence $\theta^* := (\theta_0, 0)$ is a maximiser of $g(\theta) := \theta_1 + \theta_2 b - K(\theta)$, see (42), (43). It follows that for no $\theta \in \operatorname{dom} K$ can the directional derivative $g'(\theta^*; \theta - \theta^*)$ be positive. By Csiszár and Matúš [2012, Lemma 3.6, Remark 3.7], this directional derivative is equal to

$$(\theta_1 - \theta_0) + \theta_2 b - \int (\theta_1 - \theta_0 + \theta_2 X)\, p_{\theta^*}\,d\mu = \theta_2 (b - b_0).$$

Thus, the existence of $\theta \in \operatorname{dom} K$ with $\theta_2 < 0$ rules out $b < b_0$, proving the sufficiency part of the Theorem. □



Corollary 2. Condition (55) is necessary and sufficient for $k_{\max} > 0$. Sufficient conditions are the finiteness of $m$ or the essential smoothness of $K$.

Proof. If $m$ is finite then each $\theta_2 < 0$ satisfies condition (55) with some $\theta_1$. Indeed, since $\theta_1 + \theta_2 X \le \theta_1 + \theta_2 m$ $\mu$-a.e., if the right hand side is less than $\theta_0$ in Remark 3 then $(\theta_1, \theta_2) \in \operatorname{dom} K$. If $K$ is essentially smooth then condition (55) holds because $\operatorname{int\,dom} K$ contains $\theta^* = (\theta_0, 0)$. Indeed, otherwise the directional derivatives of $K$ at $\theta^*$ in directions towards interior points were equal to $-\infty$, and $\theta^*$ could not maximize $\theta_1 + \theta_2 b_0 - K(\theta)$. □

Corollary 3. If $k_{\max} > 0$ then $\int p\,d\mu = 1$, $H(p) < +\infty$ imply $\int X p\,d\mu > -\infty$.

Proof. Substitute in the Fenchel inequality $xs \le \beta(r, s) + \beta^*(r, x)$ (a consequence of (37)) $x := \theta_1 + \theta_2 X(r)$, $s := p(r)$ and integrate. It follows that if $(\theta_1, \theta_2) \in \operatorname{dom} K$ and $p$ satisfies the hypotheses then

$$\theta_1 + \theta_2 \int X p\,d\mu \le H(p) + K(\theta_1, \theta_2) < +\infty.$$

Taking $(\theta_1, \theta_2)$ as in (55), the assertion follows. □



7 MaxLoss over Bregman balls and $f$-divergence balls

We now come back to the more specific choices (2), where $\Gamma$ is a ball of distributions in terms of some divergence $D$, centered at some $P_0$.



Relative entropy balls. Let us briefly check how the unified framework leads, in the special case of relative entropy balls, to the results of Breuer and Csiszár [2012, Theorem 1] reported in Section 4.

Set $\mu = P_0$ and take $\beta(r, s) := f(s) := s \log s - s + 1$. Then $\beta^*(r, x) = f^*(x) = \exp(x) - 1$ and $(\beta^*)'(r, x) = (f^*)'(x) = \exp(x)$. Hence, using (13) and (36),

$$K(\theta_1, \theta_2) = \int (\exp(\theta_1 + \theta_2 X) - 1)\,dP_0 = \exp(\theta_1 + \Lambda(\theta_2)) - 1$$

and $\Theta = \operatorname{dom} K = \mathbb{R} \times \operatorname{dom} \Lambda$. The functions $p_\theta$, $\theta \in \Theta$, of (38) are of form $\exp(\theta_1 + \theta_2 X(r))$, and integrate to 1 if and only if $\theta_1 = -\Lambda(\theta_2)$. Then $p_\theta$ is the $P_0$-density of $P(\theta_2)$ in the exponential family (12).

The first equation in (47) requires $p_\theta$ to be a density, thus $\theta_1 = -\Lambda(\theta_2)$. Then $\int X p_\theta\,dP_0 = \Lambda'(\theta_2)$, see (14), and, since $K(\theta) = 0$ for this $\theta_1$, the second equation in (47) reads $-\Lambda(\theta_2) + \theta_2 \Lambda'(\theta_2) = k$, which is (16). Thus Theorem 1 gives the result in Section 4 that if (16) has a negative solution $\theta$ then $V(k) = \Lambda'(\theta)$, a worst case scenario exists, and its $P_0$-density is $\exp(\theta X(r) - \Lambda(\theta))$.
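The reduction of (47) and (49) in the relative entropy case can be checked numerically. The sketch below assumes a hypothetical four-point $P_0$ and an arbitrarily chosen $\theta_2 < 0$; it evaluates $K$ directly from (36), sets $\theta_1 = -\Lambda(\theta_2)$, and approximates the partial derivatives of $K$ in (41) by central differences. The names and the step size `H_STEP` are illustrative assumptions.

```python
import math

# Hypothetical discrete example: values of X and their P0-probabilities.
XS = [-2.0, -1.0, 0.0, 1.0]
PS = [0.1, 0.2, 0.4, 0.3]

def Lambda(t):
    """Log moment generating function of X under P0."""
    return math.log(sum(p * math.exp(t * x) for x, p in zip(XS, PS)))

def K(t1, t2):
    """K from (36) with beta*(r, x) = exp(x) - 1 and mu = P0."""
    return sum(p * (math.exp(t1 + t2 * x) - 1.0) for x, p in zip(XS, PS))

def p_theta(t1, t2):
    """Values of the function p_theta in (38) at the points XS."""
    return [math.exp(t1 + t2 * x) for x in XS]

THETA2 = -0.7                 # hypothetical negative parameter
THETA1 = -Lambda(THETA2)      # makes p_theta integrate to one
DENS = p_theta(THETA1, THETA2)

MASS = sum(p * d for p, d in zip(PS, DENS))                 # integral of p_theta
B = sum(p * x * d for p, x, d in zip(PS, XS, DENS))         # integral of X * p_theta
K_RADIUS = THETA1 + THETA2 * B - K(THETA1, THETA2)          # left side of (47)

# Central differences approximating the gradient in (41).
H_STEP = 1e-6
DK1 = (K(THETA1 + H_STEP, THETA2) - K(THETA1 - H_STEP, THETA2)) / (2 * H_STEP)
DK2 = (K(THETA1, THETA2 + H_STEP) - K(THETA1, THETA2 - H_STEP)) / (2 * H_STEP)
```

Read in reverse, this gives the radius `K_RADIUS` for which the chosen `THETA2` solves (16); under the stated assumptions `B` is then the worst case value $V(k)$ for that radius.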



$f$-divergence balls. Setting $\mu = P_0$ again, take now for $\beta$ any autonomous integrand given by a convex function $f$ as in Section 3, and let

$$H(p) := \int f(p)\,dP_0. \tag{57}$$

Then the set $\Gamma$ of distributions given by (24) is equal to the $f$-divergence ball $\{P : D_f(P\,\|\,P_0) \le k\}$ if $f$ is cofinite, while if $f'(+\infty) := \lim_{s \to \infty} f(s)/s$ is finite, $\Gamma$ is a proper subset of that ball. We will focus on $\Gamma$ defined by (24) anyway.

If $f$ is not cofinite then $f^*(x) = +\infty$ for $x > f'(+\infty)$, hence

$$K(\theta_1, \theta_2) = \int f^*(\theta_1 + \theta_2 X)\,dP_0$$

is infinite when $\theta_2 < 0$, unless $m := \operatorname{ess\,inf}(X)$ is finite. By Corollary 3 of Theorem 2, this means that the functional (57) can be adequate for assigning model risk only if $f$ is cofinite or if $X$ is essentially bounded below. In the latter case, $(\theta_1, \theta_2)$ with $\theta_2 < 0$ belongs to $\operatorname{int\,dom} K$ if and only if $\theta_1 + \theta_2 m < f'(+\infty)$.

The most popular $f$-divergences are the power divergences, defined by

$$f_\alpha(s) := [s^\alpha - \alpha(s - 1) - 1]/[\alpha(\alpha - 1)], \qquad \alpha \in \mathbb{R}.$$

Formally, $f_\alpha$ is undefined if $\alpha = 0$ or $\alpha = 1$, but the definition is commonly extended by limiting, thus

$$f_0(s) := -\log s + s - 1, \qquad f_1(s) := s \log s - s + 1.$$

This means that also $D_{f_0}(P\,\|\,P_0) = I(P_0\,\|\,P)$ and $D_{f_1}(P\,\|\,P_0) = I(P\,\|\,P_0)$ are regarded as power divergences. Note that the function $f_\alpha$ is cofinite if and only if $\alpha \ge 1$, and $f'_\alpha(+\infty) = 1/(1 - \alpha)$ if $\alpha < 1$.

Let us determine the family of functions

$$p_\theta(r) = (f_\alpha^*)'(\theta_1 + \theta_2 X(r)), \qquad \theta = (\theta_1, \theta_2) \in \Theta \tag{58}$$

that contains the worst case densities in power divergence balls, more exactly, in (24) with $f = f_\alpha$. The derivative $f'_\alpha(s) = [s^{\alpha - 1} - 1]/(\alpha - 1)$ grows from $-\infty$ to $1/(1 - \alpha)$ if $\alpha < 1$, or from $1/(1 - \alpha)$ to $+\infty$ if $\alpha > 1$, as $s$ runs over $(0, +\infty)$. In the interval $(-\infty, 1/(1 - \alpha))$ or $(1/(1 - \alpha), +\infty)$, respectively, $(f_\alpha^*)'$ is the inverse function of $f'_\alpha$, thus

$$(f_\alpha^*)'(x) = [x(\alpha - 1) + 1]^{1/(\alpha - 1)} \quad \text{if } \alpha < 1,\ x < 1/(1 - \alpha) \ \text{ or } \ \alpha > 1,\ x > 1/(1 - \alpha).$$

Clearly, $(f_\alpha^*)'(x)$ does not exist if $\alpha < 1$ and $x \ge 1/(1 - \alpha)$, while if $\alpha > 1$ and $x \le 1/(1 - \alpha)$ then $(f_\alpha^*)'(x) = 0$. This gives a simple formula for the functions $p_\theta$ in (58). Unlike for the relative entropy case, however, no explicit condition is available for $\int p_\theta\,dP_0 = 1$, and the two equations in Theorem 1 cannot be reduced to one.
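For one particular power divergence the equations (47) can nevertheless be solved by hand, which gives a convenient test case. The sketch below is an illustration under stated assumptions, not a general method: it takes $\alpha = 2$, where $f_2(s) = (s - 1)^2/2$ and $(f_2^*)'(x) = \max(1 + x, 0)$, a hypothetical four-point $P_0$, and a radius $k$ small enough that the candidate density stays strictly positive. In that case one can check that $\theta_2 = -\sqrt{2k/\mathrm{Var}(X)}$ and $\theta_1 = -\theta_2 E_{P_0}(X)$ satisfy (47).

```python
import math

# Hypothetical discrete example: values of X and their P0-probabilities.
XS = [-2.0, -1.0, 0.0, 1.0]
PS = [0.1, 0.2, 0.4, 0.3]
MEAN = sum(p * x for p, x in zip(PS, XS))
VAR = sum(p * (x - MEAN) ** 2 for p, x in zip(PS, XS))

def f2(s):
    """Power divergence integrand f_alpha for alpha = 2."""
    return 0.5 * (s - 1.0) ** 2

def f2_conj(x):
    """Convex conjugate of f2 (extended by +infinity for s < 0)."""
    return x + 0.5 * x * x if x > -1.0 else -0.5

def f2_conj_prime(x):
    """(f2*)'(x): equals 1 + x for x > -1 and 0 otherwise."""
    return max(1.0 + x, 0.0)

def chi2_candidate(k):
    """Candidate solution of (47) for alpha = 2 when no truncation occurs."""
    theta2 = -math.sqrt(2.0 * k / VAR)
    theta1 = -theta2 * MEAN
    if min(theta1 + theta2 * x for x in XS) <= -1.0:
        raise ValueError("density would be truncated at zero; closed form invalid")
    dens = [f2_conj_prime(theta1 + theta2 * x) for x in XS]
    return theta1, theta2, dens

K_RADIUS = 0.05  # hypothetical radius k
THETA1, THETA2, DENS = chi2_candidate(K_RADIUS)

MASS = sum(p * d for p, d in zip(PS, DENS))
B = sum(p * x * d for p, x, d in zip(PS, XS, DENS))
KVAL = sum(p * f2_conj(THETA1 + THETA2 * x) for p, x in zip(PS, XS))
H_VAL = sum(p * f2(d) for p, d in zip(PS, DENS))
```

Under these assumptions `B` equals $E_{P_0}(X) - \sqrt{2k\,\mathrm{Var}(X)}$. For larger $k$ the truncation at zero becomes active, the function raises an error, and the two equations have to be solved numerically.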
Bregman balls. In the special case $\mu = P_0$, the Bregman distance (6) reduces to $f$-divergence: if $f$ is a non-negative convex function with $f(1) = 0$ and differentiable at $s = 1$ then $\Delta_f(s, 1) = f(s)$, consequently

$$B_{f, P_0}(P, P_0) = D_f(P\,\|\,P_0) \quad \text{for } P \ll P_0.$$

Hence, in this subsection, $\mu$ is taken different from $P_0$; for simplicity, $f$ is assumed differentiable. To obtain for $D$ in (2) resp. $H$ in (25) the Bregman distance $B_{f, \mu}$ of (6), we choose the non-autonomous integrand

$$\beta(r, s) = f(s) - f(p_0(r)) - f'(p_0(r))(s - p_0(r)).$$

To make sure that this meets the assumptions on $\beta$, in case $f'(0) = -\infty$ we assume that the default density $p_0$ is $\mu$-a.e. positive; this assumption is not needed if $f'(0) > -\infty$.

By Csiszár and Matúš [2012, Lemma 2.6], the convex conjugate of $\beta$ with respect to $s$ equals

$$\beta^*(r, x) = f^*(x + f'(p_0(r))) - f^*(f'(p_0(r))).$$

The function $K$ from (36) equals

$$K(\theta) := \int_\Omega \bigl[f^*(\theta_1 + \theta_2 X(r) + f'(p_0(r))) - f^*(f'(p_0(r)))\bigr]\,d\mu(r).$$

The family $\{p_\theta(r) : \theta \in \Theta\}$ is formed by the (non-negative) functions

$$p_\theta(r) = (\beta^*)'(r, \theta_1 + \theta_2 X(r)) = (f^*)'[\theta_1 + \theta_2 X(r) + f'(p_0(r))].$$

Note that while the case of Bregman balls is covered by our general results, it is not apparent that the current special form of $\beta$ would substantially simplify their application.
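To make the Bregman formulas concrete, the following sketch assumes the specific choice $f(s) = s \log s - s + 1$, for which $f'(s) = \log s$ and $f^*(x) = e^x - 1$, and a hypothetical value `p0` of the default density at some fixed point $r$. It compares the conjugate formula above with a grid approximation of $\sup_s (xs - \beta(r, s))$ and evaluates $p_\theta$ at that point; all names and grid parameters are illustrative.

```python
import math

def f(s):
    """f(s) = s log s - s + 1 on s >= 0 (value 1 at s = 0 by continuity)."""
    if s < 0.0:
        return math.inf
    return s * math.log(s) - s + 1.0 if s > 0.0 else 1.0

def f_prime(s):
    return math.log(s)

def f_conj(x):
    return math.exp(x) - 1.0

def f_conj_prime(x):
    return math.exp(x)

def beta(s, p0):
    """Non-autonomous Bregman integrand at a point r where p_0(r) = p0 > 0."""
    return f(s) - f(p0) - f_prime(p0) * (s - p0)

def beta_conj(x, p0):
    """Conjugate formula f*(x + f'(p0)) - f*(f'(p0)); here p0 * (exp(x) - 1)."""
    return f_conj(x + f_prime(p0)) - f_conj(f_prime(p0))

def beta_conj_on_grid(x, p0, s_max=20.0, n=100_000):
    """Grid approximation of sup_s (x*s - beta(s, p0)) over [0, s_max]."""
    return max(x * (s_max * i / n) - beta(s_max * i / n, p0)
               for i in range(n + 1))

def p_theta(t1, t2, x_val, p0):
    """p_theta at a point with X(r) = x_val and p_0(r) = p0."""
    return f_conj_prime(t1 + t2 * x_val + f_prime(p0))

P0_VALUE = 0.5  # hypothetical value of the default density at r
GRID_VALUE = beta_conj_on_grid(1.0, P0_VALUE)
FORMULA_VALUE = beta_conj(1.0, P0_VALUE)
```

With this $f$ the family reduces to $p_\theta(r) = p_0(r)\exp(\theta_1 + \theta_2 X(r))$, so $\theta = (0, 0)$ recovers the default density, in line with Remark 3.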



8 Evaluation of divergence preferences 



Finally, we briefly address divergence preferences, i.e., the problem (3) which, in the framework of Section 6, is simpler than the minimization of $H(p)$ over the set (24). Divergence preferences include as special case the multiplier preferences of Hansen and Sargent [2001], when we choose the relative entropy $I$ for $D$. Maccheroni et al. [2006] choose for $D$ the more general weighted $f$-divergences

$$D_f^w(P\,\|\,P_0) := \begin{cases} \displaystyle\int_\Omega w(r)\, f\!\left(\frac{dP}{dP_0}(r)\right) dP_0(r) & \text{if } P \ll P_0, \\ +\infty & \text{otherwise,} \end{cases} \tag{59}$$

where $w$ is a normalised, non-negative weight function.

Below, more generally, the role of $D$ is given to any convex functional as in (25). Introducing a new convex integrand and integral functional by

$$\tilde\beta(r, s) := X(r)s + \lambda\beta(r, s), \qquad \tilde H(p) := \int \tilde\beta(r, p(r))\,d\mu(r)$$

(where $\lambda > 0$ is fixed), we can write

$$W := \inf_{p:\ \int p\,d\mu = 1} \left[\int X p\,d\mu + \lambda H(p)\right] = \inf_{p:\ \int p\,d\mu = 1} \tilde H(p). \tag{60}$$

Thus, the problem is to minimize the functional $\tilde H(p)$ under the single constraint $\int p\,d\mu = 1$.



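In analogy to (35), consider

$$\tilde J(a) := \inf_{p:\ \int p\,d\mu = a} \tilde H(p), \qquad a \in \mathbb{R}.$$

Note that $\tilde\beta$ meets the basic assumptions on $\beta$ (though (28) does not hold for $\tilde H$), and that

$$(\tilde\beta)^*(r, x) = \sup_s\,[xs - X(r)s - \lambda\beta(r, s)] = \lambda\beta^*\!\left(r, \frac{x - X(r)}{\lambda}\right).$$

It follows by Csiszár and Matúš [2012, Theorem 1.1] that the convex conjugate of $\tilde J$ equals

$$\tilde K(\theta) := \int (\tilde\beta)^*(r, \theta)\,d\mu(r) = \lambda \int \beta^*\!\left(r, \frac{\theta - X(r)}{\lambda}\right) d\mu(r)$$

or, with the notation (36),

$$\tilde J^*(\theta) = \tilde K(\theta) = \lambda K\!\left(\frac{\theta}{\lambda}, -\frac{1}{\lambda}\right), \qquad \theta \in \mathbb{R}.$$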

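As the interior of $\operatorname{dom} \tilde J$ is $(0, +\infty)$, it follows that $\tilde J(a) = \tilde K^*(a)$ for each $a > 0$. In particular,

$$W = \tilde J(1) = \tilde K^*(1) = \sup_\theta\,(\theta - \tilde K(\theta)) = \sup_\theta \left[\theta - \lambda K\!\left(\frac{\theta}{\lambda}, -\frac{1}{\lambda}\right)\right] = \lambda \sup_\theta \left[\frac{\theta}{\lambda} - K\!\left(\frac{\theta}{\lambda}, -\frac{1}{\lambda}\right)\right]. \tag{61}$$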



Proposition 5. The necessary and sufficient condition for $W > -\infty$ in (60) is the existence of $\theta_1 \in \mathbb{R}$ with

$$(\theta_1, -1/\lambda) \in \operatorname{dom} K, \tag{62}$$

and then

$$W = \lambda \sup_{\theta_1}\,[\theta_1 - K(\theta_1, -1/\lambda)]. \tag{63}$$

If for some $\theta = (\theta_1, -1/\lambda)$ as in (62) the function $p_\theta$ in (38) has integral equal to one, then $\theta_1$ attains the maximum in (63), and $p = p_\theta$ attains the minimum in (60). Otherwise, among the numbers $\theta_1$ satisfying (62) there exists a largest one $\theta_{1\max}$, and $p_\theta$ with $\theta = (\theta_{1\max}, -1/\lambda)$ has integral less than one; then $\theta_1 = \theta_{1\max}$ attains the maximum in (63).



Proof. Clearly, $W = \tilde J(1) > -\infty$ if and only if $\tilde J$ never equals $-\infty$, thus its conjugate $\tilde K$ is not identically $+\infty$; by the formula for $\tilde K$, this proves the first assertion. The second assertion follows from (61). As the supremum in (63) is the same as the supremum defining $\tilde K^*(1)$ in (61) (with $\theta/\lambda$ substituted by $\theta_1$), the next assertion follows from the simple instance of Csiszár and Matúš [2012, Lemma 4.10] used in Remark 3 (note that the function $(\beta^*)'(r, \theta)$ there, replacing $\beta$ by $\tilde\beta$ and $\theta$ by $\theta_1\lambda$, gives the function $p_\theta$ in the Proposition). For the last assertion, recall that the maximum in the definition of $\tilde K^*(1)$, and therefore in (63), is always attained, because $a = 1$ is in the interior of $\operatorname{dom} \tilde K^*$ (as in Remark 3). Then the (left) derivative by $\theta_1$ of $K(\theta_1, -1/\lambda)$ at the maximiser, say $\theta_1^*$, has to be $\le 1$, and the strict inequality can hold only if $\theta_1^* = \theta_{1\max}$. As the mentioned derivative equals the integral of $p_{\theta^*}$ with $\theta^* = (\theta_1^*, -1/\lambda)$, this completes the proof. □



Evaluation of multiplier preferences. As an example, apply Proposition 5 to reproduce a result of Hansen and Sargent [2001]. We evaluate the objective function of an agent with multiplier preferences (3), choosing for $D$ the relative entropy. This corresponds to the choice $\beta(r, s) = s \log s - s + 1$, and $\mu = P_0$. In this case, the condition for $W > -\infty$ in Proposition 5 becomes $-1/\lambda \in \operatorname{dom} \Lambda$. Under that condition, the function $p_\theta$ with

$$\theta = (\theta_1, -1/\lambda), \qquad \theta_1 = -\Lambda(-1/\lambda),$$

has integral equal to one, hence the Proposition gives that this $p_\theta$, namely the member $P(-1/\lambda)$ of the exponential family with parameter value $-1/\lambda$, attains the minimum in the definition (60) of $W$. It also follows that

$$W = \lambda\theta_1 = -\lambda\Lambda\!\left(-\frac{1}{\lambda}\right).$$
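The multiplier preference formula can be checked on a small example. The sketch below assumes a hypothetical four-point $P_0$ and a hypothetical multiplier `LAM`; it compares $-\lambda\Lambda(-1/\lambda)$ with the objective $E_P(X) + \lambda I(P\,\|\,P_0)$ evaluated at the exponentially tilted distribution $P(-1/\lambda)$ and at randomly drawn alternative distributions. The function names and the random seed are illustrative assumptions.

```python
import math
import random

# Hypothetical discrete example: values of X and their P0-probabilities.
XS = [-2.0, -1.0, 0.0, 1.0]
PS = [0.1, 0.2, 0.4, 0.3]

def Lambda(t):
    """Log moment generating function of X under P0."""
    return math.log(sum(p * math.exp(t * x) for x, p in zip(XS, PS)))

def objective(qs, lam):
    """E_Q[X] + lam * I(Q || P0) for a discrete Q with probabilities qs."""
    mean = sum(q * x for q, x in zip(qs, XS))
    entropy = sum(q * math.log(q / p) for q, p in zip(qs, PS) if q > 0.0)
    return mean + lam * entropy

def multiplier_value(lam):
    """Closed form W = -lam * Lambda(-1/lam)."""
    return -lam * Lambda(-1.0 / lam)

def tilted_probs(lam):
    """Probabilities of P(-1/lam), the claimed minimiser in (60)."""
    t = -1.0 / lam
    weights = [p * math.exp(t * x) for x, p in zip(XS, PS)]
    total = sum(weights)
    return [w / total for w in weights]

def random_probs(rng):
    """A random probability vector on the support of P0."""
    raw = [rng.random() + 1e-9 for _ in XS]
    total = sum(raw)
    return [r / total for r in raw]

LAM = 1.5  # hypothetical multiplier lambda > 0
W = multiplier_value(LAM)
W_AT_TILTED = objective(tilted_probs(LAM), LAM)

RNG = random.Random(0)
RANDOM_OBJECTIVES = [objective(random_probs(RNG), LAM) for _ in range(200)]
```

In this finite setting every alternative distribution gives an objective value of at least `W`, and the tilted distribution attains it, which matches the statement of Proposition 5 for the relative entropy case.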